PERL 4.0 Reference Guide

     study(SCALAR)

     study SCALAR

     study   Takes extra time to study SCALAR ($_ if unspecified)
             in anticipation of doing many pattern matches on the
             string before it is next modified.  This may or may
             not save time, depending on the nature and number of
             patterns you are searching on, and on the distribution
             of character frequencies in the string to be searched;
             you probably want to compare runtimes with and without
             it to see which runs faster.  Those loops which scan
             for many short constant strings (including the constant
             parts of more complex patterns) will benefit most.  You
             may have only one study active at a time: if you study
             a different scalar, the first is "unstudied".  (The way
             study works is this: a linked list of every character
             in the string to be searched is made, so we know, for
             example, where all the 'k' characters are.  From each
             search string, the rarest character is selected, based
             on some static frequency tables constructed from some C
             programs and English text.  Only those places that
             contain this "rarest" character are examined.)
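
             The idea in the parenthetical above can be sketched in
             Perl itself.  This is an illustration only, not perl's
             internal implementation: the frequency table %freq and
             the subroutines study_string, rarest_char and find_word
             are made-up names, and the sketch handles only constant
             strings, not full patterns.

```perl
# Illustration only: perl's real study is implemented in C and differs in detail.
# %freq is a made-up frequency table (higher means more common).
%freq = ('e', 120, 't', 90, 'a', 80, 'o', 75, 'r', 60,
         'f', 22, 'b', 15, 'k', 8);

# "Study" a string: remember every offset at which each character occurs.
sub study_string {
    local($str) = @_;
    local($i);
    %where = ();
    for ($i = 0; $i < length($str); $i++) {
        $where{substr($str, $i, 1)} .= "$i ";
    }
}

# Pick the character of $word that the table says is rarest.
# Characters missing from the table count as rarest of all.
sub rarest_char {
    local($word) = @_;
    local($best, $c, $i);
    for ($i = 0; $i < length($word); $i++) {
        $c = substr($word, $i, 1);
        $best = $c if $best eq '' || $freq{$c} < $freq{$best};
    }
    $best;
}

# Find $word in the studied string, looking only at offsets where
# the rarest character of $word occurs.
sub find_word {
    local($str, $word) = @_;
    local($rare, $offset, $p, $start, @hits);
    $rare = &rarest_char($word);
    $offset = index($word, $rare);
    foreach $p (split(' ', $where{$rare})) {
        $start = $p - $offset;
        next if $start < 0;
        push(@hits, $start) if substr($str, $start, length($word)) eq $word;
    }
    @hits;
}

$line = "a food bar for foo";
&study_string($line);
@found = &find_word($line, 'foo');
print "rarest: ", &rarest_char('foo'), "; matches at: @found\n";
```

             With this sample line, only the three offsets holding
             an 'f' are tested, rather than every position in the
             string.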

             For example, here is a loop which inserts
             index-producing entries before any line containing a
             certain pattern:

                  while (<>) {
                       study;
                       print ".IX foo\n" if /\bfoo\b/;
                       print ".IX bar\n" if /\bbar\b/;
                       print ".IX blurfl\n" if /\bblurfl\b/;
                       ...
                       print;
                  }

             In searching for /\bfoo\b/, only those locations in $_
             that contain 'f' will be looked at, because 'f' is
             rarer than 'o'.  In general, this is a big win except
             in pathological cases.  The only question is whether it
             saves you more time than it took to build the linked
             list in the first place.
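
             One way to check that trade-off is to time the same
             matches with and without study.  The following is a
             minimal sketch using the built-in times function; the
             sample text, the word list and the subroutine name
             count_matches are invented for illustration, and the
             numbers will vary by machine and perl version.  (On
             Perl 5.16 and later, where study is documented as a
             no-op, expect no real difference.)

```perl
# Hypothetical test data: one long string and a few made-up search words.
$text  = "the quick brown fox jumps over the lazy dog " x 2000;
@words = ('fox', 'blurfl', 'dog', 'zebra');

# Run the same matches 50 times, optionally studying the string first.
sub count_matches {
    local($use_study) = @_;
    local($n, $pass, $word);
    local($_);
    for ($pass = 0; $pass < 50; $pass++) {
        $_ = $text . $pass;       # a fresh string, so each pass starts unstudied
        study if $use_study;
        foreach $word (@words) {
            $n++ if /\b$word\b/;
        }
    }
    $n;
}

($t0) = times;                    # first element is user CPU time
$with = &count_matches(1);
($t1) = times;
$without = &count_matches(0);
($t2) = times;

printf("with study:    %d matches, %.2f sec\n", $with, $t1 - $t0);
printf("without study: %d matches, %.2f sec\n", $without, $t2 - $t1);
```

             Both runs must report the same match count; only the
             times should differ.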

             Note that if you have to look for strings that you
             don't know until runtime, you can build an entire loop
             as a string and eval that to avoid recompiling all your
             patterns all the time.  Together with undefining $/ to
             input entire files as one record, this can be very
             fast, often faster than specialized programs like
             fgrep.  The following scans a list of files (@files)
             for a list of words (@words), and prints out the names
             of those files that contain a match:

                  $search = 'while (<>) { study;';
                  foreach $word (@words) {
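                      # \\b yields a literal \b (word boundary) in the generated
                      # code; a bare \b in double quotes would be a backspace.
                      $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";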
                  }
                  $search .= "}";
                  @ARGV = @files;
                  undef $/;
                  eval $search;       # this screams
                  $/ = "\n";          # put back to normal input delim
                  foreach $file (sort keys(%seen)) {
                      print $file, "\n";
                  }
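
             When building code as a string like this, it can help
             to print the string before handing it to eval.  The
             sketch below assumes a hypothetical two-word list and
             adds a newline after "study;" purely for readability.
             It doubles the backslashes so that the generated code
             contains \b rather than a backspace character.

```perl
# Hypothetical word list, for illustration only.
@words = ('foo', 'bar');

$search = "while (<>) { study;\n";
foreach $word (@words) {
    # "\\b" puts a backslash followed by 'b' into the generated code.
    $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
}
$search .= "}";

# Show the code that eval would compile.
print $search, "\n";
```

             For this word list the printed loop body has one line
             per word, of the form  ++$seen{$ARGV} if /\bfoo\b/;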